Mapping the Natural Language Processing Domain: Experiments using the ACL Anthology
Abstract
This paper investigates the evolution of the computational linguistics domain through a quantitative analysis of the ACL Anthology (around 12,000 papers published between 1985 and 2008). Our approach combines complex-systems methods with natural language processing techniques. We reconstruct the socio-semantic landscape of the domain by inferring a co-authorship network and a semantic network from the corpus. First, keywords are extracted using a hybrid approach that combines linguistic patterns with statistical information. Then, the semantic network is built from a co-occurrence analysis of these keywords within the corpus. Combining temporal and network analysis techniques, we examine the main evolutions of the field and its most active subfields over time. Lastly, we propose a model to explore the mutual influence of the social and semantic networks over time, leading to a socio-semantic co-evolutionary system.
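The abstract names two pipeline steps (hybrid keyword extraction, then a keyword co-occurrence network) without giving their exact patterns or statistics. The following is only a minimal sketch of that kind of pipeline under stated assumptions, not the authors' method: documents are assumed to be already POS-tagged with Penn Treebank-style tags; the linguistic pattern (runs of adjectives and nouns ending in a noun), the document-frequency threshold `min_docs`, and document-level co-occurrence are illustrative choices; and the function names `candidate_terms`, `extract_keywords`, and `cooccurrence_network` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumption: input tokens are (word, tag) pairs with Penn Treebank-style tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ"}


def candidate_terms(tagged_tokens, max_len=4):
    """Linguistic filter (illustrative pattern): maximal runs of adjectives
    and nouns, trimmed so they end in a noun; runs longer than max_len are dropped."""
    terms = set()
    run = []
    # A sentinel token closes any run still open at the end of the document.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag in NOUN_TAGS or tag in ADJ_TAGS:
            run.append((word.lower(), tag))
            continue
        # Trim trailing adjectives so the candidate term ends in a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if 0 < len(run) <= max_len:
            terms.add(" ".join(w for w, _ in run))
        run = []
    return terms


def extract_keywords(corpus, min_docs=2):
    """Statistical filter (illustrative): keep candidates that occur in at
    least min_docs documents; returns one keyword set per document."""
    per_doc = [candidate_terms(doc) for doc in corpus]
    doc_freq = Counter(term for terms in per_doc for term in terms)
    vocab = {t for t, df in doc_freq.items() if df >= min_docs}
    return [terms & vocab for terms in per_doc]


def cooccurrence_network(doc_keywords):
    """Weighted undirected edges: number of documents in which two keywords co-occur."""
    edges = Counter()
    for terms in doc_keywords:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(terms), 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    corpus = [
        [("statistical", "JJ"), ("machine", "NN"), ("translation", "NN"),
         ("uses", "VBZ"), ("parallel", "JJ"), ("corpora", "NNS")],
        [("we", "PRP"), ("improve", "VBP"), ("statistical", "JJ"), ("machine", "NN"),
         ("translation", "NN"), ("with", "IN"), ("parallel", "JJ"), ("corpora", "NNS")],
        [("parsing", "NN"), ("is", "VBZ"), ("hard", "JJ")],
    ]
    keywords = extract_keywords(corpus, min_docs=2)
    print(dict(cooccurrence_network(keywords)))
    # → {('parallel corpora', 'statistical machine translation'): 2}
```

Counting co-occurrence per document is the coarsest possible choice; a sentence-level window or a normalized association score (rather than raw counts) would be reasonable alternatives, and tracking the edge weights per publication year is one way the temporal analysis mentioned in the abstract could be layered on top.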
Similar resources
The ACL Anthology Searchbench
We describe a novel application for structured search in scientific digital libraries. The ACL Anthology Searchbench is meant to become a publicly available research tool to query the content of the ACL Anthology. The application provides search in both its bibliographic metadata and semantically analyzed full textual content. By combining these two features, very efficient and focused queries ...
The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics
The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Antho...
Towards an ACL Anthology Corpus with Logical Document Structure. An Overview of the ACL 2012 Contributed Task
The ACL 2012 Contributed Task is a community effort aiming to provide the full ACL Anthology as a high-quality corpus with rich markup, following the TEI P5 guidelines: a new resource dubbed the ACL Anthology Corpus (AAC). The goal of the task is threefold: (a) to provide a shared resource for experimentation on scientific text; (b) to serve as a basis for advanced search over the ACL Anthology...
He Said, She Said: Gender in the ACL Anthology
Studies of gender balance in academic computer science are typically based on statistics on enrollment and graduation. Going beyond these coarse measures of gender participation, we conduct a fine-grained study of gender in the field of Natural Language Processing. We use topic models (Latent Dirichlet Allocation) to explore the research topics of men and women in the ACL Anthology Network. We ...
The ACL Anthology Network Corpus as a Resource for NLP-based Bibliometrics
The ACL Anthology Network (AAN) is another successful project built on top of the ACL Anthology. It was started in 2007 by our group (CLAIR) (Radev et al., 2009) at the University of Michigan. Table 1 shows some statistics of the current release of AAN. We convert the articles included in the ACL Anthology corpus (excluding book reviews) from PDF to text. This text is then processed to identify...
Publication date: 2014